- SIMSTAT 1.00c
-
- August 9, 1992
-
-
- Designed and written by Normand Peladeau
-
-
- Copyright (C) 1991,92, N. Peladeau
-
-
-
- FILES YOU SHOULD HAVE
- ---------------------
-
- Before you start working with SIMSTAT, take a few moments to
- check the content of the archived file. The program should include 12
- files:
-
- 4 text files
- READ.ME -- This file.
- ORDER.FRM -- Use this file to order copies of Simstat.
- LISENCE.DOC -- License and warranty information.
- VENDOR.DOC -- Information for shareware vendors.
-
- 3 files for the main program
- SIMSTAT.EXE -- The simstat statistical program.
- SIMSTAT.DEF -- The simstat default configuration file.
- SIMSTAT.HLP -- The simstat help file.
-
- 4 sample data files
- SAMPLE.DAT -- A sample ASCII data file.
- SAMPLE.SYS -- A SPSS/PC+ system file.
- SAMPLE.DBF -- A dBASE III data file
- SAMPLE.WKS -- A Lotus 1-2-3 data file.
-
- 1 utility file
- EXPAND.EXE -- A program to uncompress listing files.
-
-
- INSTALLING AND RUNNING THE PROGRAM
- ----------------------------------
-
- To install the program, simply copy all the files to the
- destination disk or directory.
-
- There are three command line options that can be used with
- SIMSTAT:
-
- /M or -M Force monochrome color set on a computer with a color
- card.
-
- /E or -E Display 43 lines on an EGA or 50 lines on a VGA
- monitor.
-
- /C or -C Save the listing file in a compressed format. This
- option is useful for saving disk space when running on
- a laptop. The listing file will take up to 75% less
- disk space. You must use the EXPAND.EXE utility to
- uncompress the listing file.
-
- MAIN PROGRAM FEATURES
- ---------------------
-
- SIMSTAT is a menu-driven statistical program that provides many
- basic descriptive and comparative statistics, including:
- o Summary statistics (mean, variance, standard deviation, etc.)
- o Crosstabulation
- - normal crosstabulation and inter-rater agreement table
- - nominal statistics including:
- - chi-square
- - Pearson's Phi
- - Goodman-Kruskal's Gamma
- - Contingency coefficient
- - ordinal statistics
- - Kendall's tau-b
- - Kendall's tau-c
- - Pearson's R
- - Symmetric and asymmetric Somers' D, Dxy and Dyx
- - inter-rater agreement statistics including:
- - percentage of agreement
- - Cohen's Kappa
- - Scott's Pi
- - Krippendorff's r
- - Krippendorff's R-bar
- - free marginal correction for nominal and ordinal measures
- o Frequencies analysis including:
- - frequencies table
- - barchart
- - histogram
- - descriptive statistics
- - percentile table
- o Breakdown analysis
- o Oneway analysis of variance
- o Paired and independent sample t-tests
- o Pearson correlation matrix, covariance and cross product
- deviation
- o Regression analysis including:
- - Linear and 7 nonlinear regressions including:
- - quadratic
- - cubic
- - 4th degree polynomial
- - 5th degree polynomial
- - logarithmic
- - exponential
- - inverse
- - X and Y scatterplot
- - regression equation
- - analysis of variance
- - residuals plot
- o Nonparametric analysis including:
- - Mann-Whitney U test
- - Wilcoxon T-test
- - Sign test
- - Kruskal-Wallis ANOVA
- - Kolmogorov-Smirnov test for 2 samples
- - Moses test of extreme reactions
- - Median test (2 or more samples)
- o Nonparametric association matrix including:
- - Spearman's R
- - Somers' D, Dxy and Dyx
- - Goodman-Kruskal's Gamma
- - Kendall's Tau-a, Tau-b
- - Kendall-Stuart's Tau-c
-
-
- BOOTSTRAP ANALYSIS
- ------------------
-
- SIMSTAT also gives the user access to an innovative and extremely
- powerful statistical technique called bootstrap simulation. This
- technique, developed by Efron (Efron, 1981; Diaconis & Efron, 1983),
- can be used to assess various properties of statistical estimators,
- such as their accuracy and their sampling variability. Typical
- applications include the computation of nonparametric estimates of
- sampling distributions, the assessment of the stability of statistical
- estimators and the construction of nonparametric confidence intervals.
- SIMSTAT also allows the computation of nonparametric power estimates
- and Type I error rates for various estimators.
-
- The following section provides a short non-technical introduction
- to the bootstrap technique, followed by a description of SIMSTAT's
- particular implementation of bootstrapping methodology. Potential
- applications for researchers, statistical consultants, and students
- and teachers of statistics are also presented. For further
- information about bootstrap methods and their applications, you can
- read the articles of Efron and his colleagues (Diaconis & Efron, 1983;
- Efron, 1981; Efron & Gong, 1983). Wasserman and Bockenhold (1989)
- also provide an excellent introduction to bootstrap methodology, while
- Stine (1989) offers a comprehensive presentation of its potential
- applications.
-
- WHAT IS BOOTSTRAP SIMULATION?
-
- Bootstrap simulation is a resampling technique in which the
- subjects of the initial sample are treated as if they constituted the
- population under study. Conceptually, the data are replicated an
- infinite number of times, and a large number of samples, each the
- same size as the original sample, are drawn at random from that
- population. In practice this amounts to drawing observations from
- the original sample with replacement. By computing a statistical
- estimator of interest (such as a mean or a correlation between two
- variables) for every bootstrap sample, this resampling procedure
- recreates an empirical sampling distribution of that statistic.
-
- The main advantage of such a procedure is that the sampling
- distribution is not mathematically estimated but empirically
- reconstructed based on all the original characteristics of the data.
- So, it automatically takes into account distribution properties that
- are generally considered as contaminating factors, such as skewness,
- ceiling effects, outliers, etc. This feature makes bootstrap
- estimations adequate even when data are not normally distributed. In
- fact, bootstrapping can even be used to describe the sampling
- distribution of estimators for which sampling properties are unknown
- or unavailable.
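-
- The resampling idea described above can be sketched in a few lines
- of Python. This is only an illustration, not SIMSTAT's own code: the
- function name bootstrap_distribution, its parameters and the sample
- data are all hypothetical.

```python
import random
import statistics


def bootstrap_distribution(sample, estimator, n_resamples=1000, seed=None):
    """Return the estimator computed on samples redrawn with replacement."""
    rng = random.Random(seed)
    size = len(sample)
    # Drawing with replacement from the sample is equivalent to drawing from
    # a population made of infinitely many copies of that sample.
    return [estimator(rng.choices(sample, k=size)) for _ in range(n_resamples)]


# Hypothetical, deliberately skewed data (note the outlier at 15).
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 9, 15]

means = bootstrap_distribution(scores, statistics.mean, n_resamples=2000, seed=1)
print("original mean:           ", statistics.mean(scores))
print("bootstrap standard error:", statistics.stdev(means))
```

- The list of 2000 means is the empirical sampling distribution; its
- spread reflects the skewness and the outlier of the original data
- without any distributional assumption.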
-
- SIMSTAT IMPLEMENTATION OF BOOTSTRAPPING
-
- SIMSTAT provides bootstrap analysis for seven descriptive
- estimators of a single variable and twenty estimators involving two
- variables. Those estimators are:
-
- One-variable estimators:
- - Mean
- - Median
- - Variance
- - Standard deviation
- - Standard error
- - Skewness
- - Kurtosis
-
- Two-variable estimators:
- - Kendall's Tau-A and B
- - Kendall-Stuart's Tau-C
- - Symmetric and asymmetric Somers' D
- - Goodman-Kruskal's Gamma
- - Student's t and F
- - Pearson's r
- - Spearman's R
- - Regression slope and intercept
- - Mann-Whitney's U
- - Wilcoxon's W
- - Difference between means
- - Difference between variances
- - Sign test
- - Kruskal-Wallis ANOVA
- - Median test
-
- The number of bootstrap samples for a single analysis can range
- from 100 to 20,000. The output of a simulation analysis can consist
- of various results, including descriptive statistics, frequency
- tables, histograms and percentile tables. The program also computes
- bootstrap confidence intervals.
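-
- As an illustration of a bootstrap confidence interval, the sketch
- below uses the simple percentile method. The text does not say which
- interval method SIMSTAT uses, so the percentile method is an
- assumption, and the name percentile_interval and the data are
- hypothetical.

```python
import random
import statistics


def percentile_interval(sample, estimator, level=0.95, n_resamples=2000, seed=None):
    """Percentile bootstrap interval (assumed method, not necessarily SIMSTAT's)."""
    rng = random.Random(seed)
    size = len(sample)
    values = sorted(
        estimator(rng.choices(sample, k=size)) for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    # Read the interval bounds directly off the sorted bootstrap estimates.
    lower = values[int(tail * (n_resamples - 1))]
    upper = values[int((1.0 - tail) * (n_resamples - 1))]
    return lower, upper


# Hypothetical data.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 9, 15]

low, high = percentile_interval(scores, statistics.mean, seed=7)
print("95% percentile interval for the mean:", low, "to", high)
```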
-
- For estimators which can be tested for significance, SIMSTAT also
- displays nonparametric power estimates for up to four alpha levels.
- Power estimation with the bootstrap technique is straightforward:
- while performing bootstrap on a given data set, the proportion of
- redrawn samples that lead to a statistically significant estimator (at
- some given alpha level) is computed and used as a power estimate. In
- addition to simulation results, the program displays the value of the
- seed used to initialize the random number generator. This value may
- then be used to regenerate the same data at a later time or to compare
- various estimators using the same bootstrap samples.
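-
- The power estimate described above is simply a proportion of
- significant redrawn samples. The sketch below is an illustration under
- stated assumptions: it tests whether a mean differs from zero, and it
- uses a normal approximation for the critical values, which is not
- necessarily the test SIMSTAT applies. The name bootstrap_power and the
- data are hypothetical. Passing the same seed reproduces the same
- bootstrap samples.

```python
import math
import random
import statistics


def bootstrap_power(sample, alphas=(0.05, 0.01), n_resamples=2000, seed=None):
    """Proportion of redrawn samples whose mean differs significantly from 0."""
    rng = random.Random(seed)
    size = len(sample)
    # Two-sided critical values from the normal approximation (an assumption).
    critical = {a: statistics.NormalDist().inv_cdf(1.0 - a / 2.0) for a in alphas}
    hits = dict.fromkeys(alphas, 0)
    for _ in range(n_resamples):
        redrawn = rng.choices(sample, k=size)
        spread = statistics.stdev(redrawn)
        if spread == 0.0:
            continue  # degenerate redraw: counted as non-significant
        z = statistics.mean(redrawn) / (spread / math.sqrt(size))
        for a in alphas:
            if abs(z) > critical[a]:
                hits[a] += 1
    return {a: hits[a] / n_resamples for a in alphas}


# Hypothetical paired differences.
differences = [1.2, -0.4, 0.8, 2.1, 0.3, 1.5, -0.9, 0.6, 1.1, 0.2,
               1.8, -0.2, 0.9, 1.4, 0.5, 0.7, -0.6, 1.0, 2.4, 0.1]

power = bootstrap_power(differences, seed=42)
for alpha, estimate in power.items():
    print("alpha =", alpha, " estimated power =", estimate)
```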
-
- EXTENSIONS TO BOOTSTRAP
-
- To achieve an even greater range of potential applications, SIMSTAT
- implements two extensions to standard bootstrap simulation.
-
- 1) Variable sample size
-
- One typical aspect of bootstrap simulation is that it generally
- involves redrawn samples of the same size as the original one.
- SIMSTAT, for its part, makes it possible to modify the size of the
- bootstrap samples, so that estimator distributions obtained from
- different sample sizes can be compared. The user can set up
- bootstrap simulations involving sample sizes that range from 10 to
- 20,000 observations.
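-
- A minimal sketch of this extension, assuming nothing about SIMSTAT's
- internals: the only change from a standard bootstrap is that the size
- of each redrawn sample is a free parameter. The name bootstrap_se and
- the data are hypothetical.

```python
import random
import statistics


def bootstrap_se(sample, estimator, resample_size, n_resamples=1000, seed=None):
    """Standard deviation of the estimator over redrawn samples of a chosen size."""
    rng = random.Random(seed)
    values = [
        estimator(rng.choices(sample, k=resample_size)) for _ in range(n_resamples)
    ]
    return statistics.stdev(values)


# Hypothetical data.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 9, 15]

for size in (10, 40, 160):
    se = bootstrap_se(scores, statistics.mean, size, seed=3)
    print("sample size", size, "-> standard error of the mean", round(se, 3))
```

- The standard error shrinks as the redrawn sample size grows, which
- is the comparison this extension is meant to support.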
-
- 2) Random sampling
-
- Another aspect of bootstrapping is that it assumes that the
- original sample is representative of the population. SIMSTAT offers a
- modified bootstrap sampling process that makes the null assumption
- that there is no difference or relation in the population. While in
- bootstrap sampling the drawing is achieved on subjects, the RANDOM
- procedure extracts the data for each variable independently.
- Consequently, while a standard bootstrap simulation on a correlation
- between two variables would yield coefficients that fluctuate around
- the correlation that exists in the original sample, the RANDOM
- procedure would produce correlations that vary around a null
- correlation. In this procedure, the proportion of redrawn samples
- that lead to a statistically significant estimator at a given alpha
- level is used to assess the Type I error rate.
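-
- The contrast between the two sampling schemes can be sketched as
- follows. This is an illustration, not SIMSTAT's code: the function
- names and the data are hypothetical, and significance is judged with
- Fisher's z approximation, which is an assumption made to keep the
- example short.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # constant redraw: treated as no correlation
    return sxy / math.sqrt(sxx * syy)


def resampled_correlations(xs, ys, independent, n_resamples=2000, seed=None):
    rng = random.Random(seed)
    n = len(xs)
    results = []
    for _ in range(n_resamples):
        rows = rng.choices(range(n), k=n)
        # Standard bootstrap keeps each subject's pair together; the
        # independent variant redraws the second variable separately.
        cols = rng.choices(range(n), k=n) if independent else rows
        results.append(pearson_r([xs[i] for i in rows], [ys[j] for j in cols]))
    return results


def rejection_rate(rs, n, alpha=0.05):
    """Share of correlations significant under Fisher's z approximation."""
    critical = statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)
    significant = 0
    for r in rs:
        r = max(-0.999999, min(0.999999, r))  # keep atanh finite
        if abs(math.atanh(r)) * math.sqrt(n - 3) > critical:
            significant += 1
    return significant / len(rs)


# Hypothetical, strongly correlated data.
xs = list(range(1, 16))
ys = [2.1, 1.8, 3.9, 3.2, 5.5, 6.1, 5.9, 8.4, 8.0, 10.3, 10.9, 11.2, 13.8, 13.1, 15.6]

paired = resampled_correlations(xs, ys, independent=False, seed=5)
shuffled = resampled_correlations(xs, ys, independent=True, seed=5)
print("standard bootstrap, mean r:  ", round(statistics.fmean(paired), 3))
print("independent redraws, mean r: ", round(statistics.fmean(shuffled), 3))
print("estimated Type I error rate: ", rejection_rate(shuffled, len(xs)))
```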
-
- BOOTSTRAP APPLICATIONS
-
- We have already seen that standard bootstrap resampling can be
- used to obtain various measures of sampling variability, such as
- nonparametric confidence intervals. The capability to alter the
- bootstrap sample size and to replicate the condition of the null
- hypothesis also opens up numerous new applications. The following
- topics give some examples of such applications.
-
- 1) Research planning - Power estimation
-
- The possibility to compare various estimator distributions
- obtained for different sample sizes can prove useful in planning
- research by allowing the researcher to determine the sample size
- needed to achieve a desired precision level. It can also be used for
- power estimation allowing comparison of the power attained using
- various estimators and/or sample sizes. Researchers thus have an
- empirical basis for choosing between two different statistical
- strategies. In addition, unlike standard approaches to power
- estimation, which rely on numerous assumptions, including normal data
- distributions, bootstrap power estimates make no distributional
- assumptions.
-
- 2) Teaching Tool
-
- As a teaching tool, bootstrap simulation would be effective in
- illustrating to new statistics students concepts such as sampling
- theory or the central limit theorem. It would provide a simulation of
- the sampling
- process of an experiment, allowing the students to visualize the
- sampling variability of given estimators. By increasing or decreasing
- sample size, the student can observe how these changes affect the
- variability of estimators or the statistical power of an experiment.
- Additionally, bootstrap would be effective in demonstrating how
- outliers can affect estimation and how data transformation can improve
- population estimates.
-
- 3) Monte Carlo investigations
-
- Bootstrap might also be handy for the researcher interested in
- studying the effect of violations of the normality assumption on some
- estimators, by allowing the evaluation of the Type I and Type II
- error rates of a test (the latter determining its statistical power).
- While Monte Carlo
- simulations usually analyze data generated by assumed mathematical
- functions, bootstrap simulation provides a direct assessment of sample
- distributions from data provided by the researcher. By performing
- simulation on data distributions more representative of real world
- data, bootstrap may therefore be a more appropriate evaluation of
- statistical robustness.
-
-
- BOOTSTRAP REFERENCES
-
- DIACONIS, P., & EFRON, B. (1983, May). Computer intensive methods in
- statistics. SCIENTIFIC AMERICAN, 116-130.
-
- EFRON, B. (1981). Nonparametric estimates of standard error: The
- jackknife, the bootstrap, and other resampling methods.
- BIOMETRIKA, 68, 589-599.
-
- EFRON, B., & GONG, G. (1983). A leisurely look at the bootstrap, the
- jackknife and cross-validation. AMERICAN STATISTICIAN, 37,
- 36-48.
-
- STINE, R. (1989). An introduction to bootstrap methods: Examples and
- ideas. SOCIOLOGICAL METHODS AND RESEARCH, 18(2&3), 243-290.
-
- WASSERMAN,
-
-
- INPUT AND OUTPUT
- ----------------
-
- The data may be entered directly from the keyboard or read
- directly from a dBASE file (version III or IV), a Lotus 1-2-3 file,
- an SPSS/PC+ file or an ASCII data file. Data entered from the
- keyboard may be saved for later analysis.
- The output can be read on the screen, saved to disk in a listing
- file, and/or sent directly to the printer.
-
-
- CAPABILITY
- ----------
-
- The program can handle up to 500 variables and 20,000 cases. A
- simulation can draw between 100 and 20,000 bootstrap samples. These
- limits are absolute maximums; the effective limits can be somewhat
- lower depending on the amount of memory available.
-
-
- SYSTEM REQUIREMENTS
- -------------------
-
- SIMSTAT will run on any IBM PC/XT, AT, PS/2 and compatible under
- MS-DOS/PC-DOS version 2.0 or higher. A minimum of 356K of free RAM is
- necessary.
-
- The program does not need a numeric coprocessor but will use it
- if available. A coprocessor is highly recommended for extensive
- bootstrap simulation or computation on large samples.
-
- SIMSTAT takes less than 120K of disk space (including the help
- file). It can easily be run on a system with a single 360K disk drive
- or on a laptop computer.
-
-
- CREDIT
- ------
-
- IBM-PC/XT, AT, PS/2 and PC-DOS are trademarks of International
- Business Machines.
-
- MS-DOS is a registered trademark of Microsoft Corporation.
-
- SPSS/PC+ is a registered trademark of SPSS Inc.
-
- DBASE III and IV are trademarks of Ashton-Tate.
-
- LOTUS is a trademark of Lotus Corp.
-
-
- RELEASE HISTORY
- ---------------
-
- 1.00c 09-08-92
-
- - Fixed "Invalid printer port" error on 486 computers.
- (I/O OPTIONS).
- - Fixed problem with analysis on more than 16,384 valid cases.
- - Fixed floating point error when performing a logarithmic
- regression with a value of zero (REGRESSION)
-
- 1.00a 06-19-92
-
- NEW FEATURES AND IMPROVEMENTS
- -----------------------------
- - Added bootstrap simulation analysis for 7 descriptive
- statistics and 20 bivariate statistics.
- - More powerful case selection. A single string of up to 250
- characters can now be used to select cases. It may consist
- of a simple expression or include many expressions related
- by logical operators (AND, OR, XOR). Multiple levels of
- parentheses can be used to control the order in which
- expressions are evaluated.
- - Added 6 new measures of inter-rater agreement (CROSSTAB)
- o Scott's pi (nominal)
- o Adjusted Kappa (nominal)
- o Krippendorff's r (ordinal)
- o Krippendorff's R bar (ordinal)
- o free marginals adjustment (nominal and ordinal).
- - Added the ability to display a listing of the values of the
- selected dependent and independent variables.
- - Improved memory management that gives up to 64K more RAM
- for statistical analysis.
- - Improved monochrome color scheme.
- - Added user defined page header.
- - Improved algorithm for automatic histogram scaling when a
- normal curve is superimposed (FREQUENCIES).
- - More consistent use of the keyboard keys for the CHOOSE X-Y
- command (pressing the escape key will cancel the operation
- and restore previous variable definitions while the F10 key
- is used to accept the data definition).
-
- FIXED PROBLEMS
- --------------
- - Fixed problem with the computation of Spearman's R (NPAR
- MATRIX).
- - Fixed problem with the output of variance and standard
- error in analysis of variance (ONEWAY).
- - Fixed problem with histogram output with a normal curve
- (FREQUENCIES).
- - Fixed floating point overflow error (runtime error 205) in
- the computation of Kolmogorov-Smirnov and chi-square
- probability (KOLMOGOROV-SMIRNOV, CROSSTAB and KRUSKALL-
- WALLIS).
- - Fixed problem with variable label printing beyond the
- listing width (DESCRIPTIVE).
- - Fixed problem with residual plot in regression analysis
- (REGRESSION).
- - Fixed problem with SPSS/PC+ files of more than 500
- variables.
- - Fixed problem with monochrome color scheme (-m switch).
- - Fixed problem with memory management in regression
- procedure (REGRESSION).
-
- 0.93 (beta) 12-15-91
-
- NEW FEATURES AND IMPROVEMENTS
- -----------------------------
- - Added nonlinear regressions (quadratic, cubic, 4th and 5th
- degree polynomial, logarithmic, exponential and inverse)
- (REGRESSION).
- - The F7 key can be used to toggle the printer on and off.
- - The F8 key can be used to toggle the disk log on and off.
- - Improved precision for confidence intervals with few
- degrees of freedom (fewer than 8) (ONEWAY and REGRESSION).
- - Added capability to browse through the last analysis even
- if the listing is not saved to disk.
- - The file listing window displays information on files and
- directory (OPEN FILE).
-
- FIXED PROBLEMS
- --------------
- - Fixed problem with the scatterplot scaling (REGRESSION).
- - Fixed problem with confidence intervals other than 95%
- (ONEWAY and REGRESSION).
- - Fixed problem with barchart on string variables
- (FREQUENCIES).
-
- 0.92 (beta) 09-21-91
-
- NEW FEATURES AND IMPROVEMENTS
- -----------------------------
- - Added option for user defined confidence intervals (ONEWAY
- and REGRESSION).
- - The "." character in ASCII files is now treated as a missing
- value.
- - Added automatic decimal adjustment for very small numbers.
- - Improved algorithm for choosing scatterplot scaling
- (FREQUENCIES)
-
- FIXED PROBLEMS
- --------------
- - Fixed problem with large number overlapping.
- - Fixed problem with the computation of the mode
- (FREQUENCIES).
- - More typographical errors corrected.
-
- 0.91 (beta) 06-25-91
-
- NEW FEATURES AND IMPROVEMENTS
- -----------------------------
- - Added option to eliminate the beep (I/O OPTION).
- - REGRESSION procedure now includes an anova, standard error
- and confidence interval of the slope and the intercept.
-
- FIXED PROBLEMS
- --------------
- - Fixed problem with printer error message.
- - Fixed problem with reading Lotus file (OPEN FILE).
- - Help file corrected.
-
- 0.90 (beta) 06-14-91 - First public release.
-
-
- DISTRIBUTION
- ------------
-
- Since SIMSTAT 1.00 is a shareware product, you are encouraged to
- experiment with it and share it with your friends as long as the
- following provisions are met:
-
- 1) It is distributed ONLY in its original, unmodified form.
- 2) No fee is charged for copying or distribution without
- the author's permission.
-
- You can contact the author by writing to the following addresses:
-
- By mail:
- Normand Peladeau
- Provalis Research
- 5000, Adam street
- Montreal, QC
- H1V 1W5
-
- By electronic mail via CompuServe:
- Normand Peladeau
- User# [71760,2103]
-
-